HDD: a hypercube division-based algorithm for discretisation

Authors

  • Ping Yang
  • JiSheng Li
  • YongXuan Huang
Abstract

Discretisation, as one of the basic data preparation techniques, plays an important role in data mining. This article introduces a new hypercube division-based (HDD) algorithm for supervised discretisation. The algorithm considers the distribution of both the class and the continuous attributes, together with the underlying correlation structure in the data set. It tries to find a minimal set of cut points that divides the continuous attribute space into a finite number of hypercubes, such that the objects within each hypercube belong to the same decision class. Finally, tests are performed on seven mixed-mode data sets, and the C5.0 algorithm is used to generate classification rules from the discretised data. Compared with three other well-known discretisation algorithms, the HDD algorithm generates a better discretisation scheme, improving the accuracy of classification and reducing the number of classification rules.

1. Introduction

With the rapid development of computer and internet technologies, the amount of data and information grows exponentially. Since data mining is an extremely powerful approach for extracting useful knowledge from large databases, it has become a research focus. Many data mining algorithms require that the training data contain only discrete attributes; in practice, however, a large number of attributes are continuous in nature. To use these algorithms, the continuous attributes must first be discretised, which demands studies of appropriate discretisation methods.

Discretisation is a process that divides the value domain of a continuous attribute into a small number of intervals, where each interval is mapped to a numerical, discrete value. After discretisation, data can be reduced and simplified, so results obtained through decision trees or induction rules are usually more compact, shorter and more accurate than results derived using continuous values. Discretisation, as …
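
To make the interval mapping concrete, here is a minimal sketch in Python. The cut points and values are invented for illustration; the paper does not prescribe a particular encoding:

```python
from bisect import bisect_right

def discretise(value, cut_points):
    """Map a continuous value to a discrete interval index.

    cut_points must be sorted; k cut points induce k + 1 intervals,
    so every continuous value is replaced by an index in 0..k.
    """
    return bisect_right(cut_points, value)

# Three cut points divide the attribute domain into four intervals.
cuts = [2.5, 5.0, 7.5]
print([discretise(v, cuts) for v in [1.0, 3.3, 5.0, 9.9]])  # [0, 1, 2, 3]
```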
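
The abstract's core requirement, a set of cut points whose induced hypercubes are each class-pure, can also be sketched. The greedy candidate selection below is an assumption made for illustration (the paper searches for a minimal cut set; its actual procedure is not reproduced here), and all function names are hypothetical:

```python
from bisect import bisect_right
from collections import defaultdict

def cell_of(obj, scheme):
    """Return the hypercube an object falls into: one interval index per
    attribute, induced by that attribute's sorted cut points."""
    return tuple(bisect_right(cuts, v) for v, cuts in zip(obj, scheme))

def impure_cells(objects, labels, scheme):
    """Count hypercubes containing objects from more than one class."""
    classes = defaultdict(set)
    for obj, label in zip(objects, labels):
        classes[cell_of(obj, scheme)].add(label)
    return sum(len(s) > 1 for s in classes.values())

def greedy_hdd(objects, labels):
    """Add cut points greedily until every hypercube is class-pure.
    Candidates are midpoints between consecutive distinct values."""
    n_attrs = len(objects[0])
    candidates = []
    for a in range(n_attrs):
        vals = sorted({obj[a] for obj in objects})
        candidates += [(a, (x + y) / 2) for x, y in zip(vals, vals[1:])]
    scheme = [[] for _ in range(n_attrs)]
    while impure_cells(objects, labels, scheme) > 0:
        if not candidates:
            raise ValueError("identical objects carry different classes")
        # Choose the cut that leaves the fewest impure hypercubes.
        def score(cut):
            a, c = cut
            trial = [sorted(s + [c]) if i == a else s
                     for i, s in enumerate(scheme)]
            return impure_cells(objects, labels, trial)
        best = min(candidates, key=score)
        candidates.remove(best)
        a, c = best
        scheme[a] = sorted(scheme[a] + [c])
    return scheme

# Toy data: one cut on the first attribute separates the two classes.
objs = [(1.0, 1.0), (1.2, 3.0), (4.0, 1.1), (4.2, 3.2)]
labs = ["A", "A", "B", "B"]
print(greedy_hdd(objs, labs))  # [[2.6], []]
```

A greedy search of this kind is not guaranteed to find the minimal cut set, which is what distinguishes the sketch from the algorithm the paper proposes.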

Similar articles

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that could run on a Fibonacci Hypercube structure. Most popular parallel matrix multiplication algorithms cannot run on the Fibonacci Hypercube structure, so a method that runs on all structures, and especially on the Fibonacci Hypercube, is needed for parallel matr...

Hypercube Bivariate-Based Key Management for Wireless Sensor Networks

Wireless sensor networks are composed of very small devices, called sensor nodes, used for numerous applications in the environment. In adversarial environments, security becomes a crucial issue in wireless sensor networks (WSNs). There are various security services in WSNs, such as key management, authentication, and pairwise key establishment. Due to some limitations on sensor nodes, the previous k...

Using Latin Hypercube Sampling Based on the ANN-HPSOGA Model for Estimation of the Creation Probability of Damaged Zone Around Underground Spaces

The excavation damaged zone (EDZ) can be defined as a rock zone where the rock properties and conditions have changed due to processes related to an excavation. This zone affects the behavior of the rock mass surrounding the construction, reducing the stability and safety factor and increasing the probability of failure of the structure. In this paper, a methodology was examined for computing...

Combining radial basis function neural network and genetic algorithm to improve HDD driver IC chip scale package assembly yield

In recent years, the trend in micro HDD driver ICs for large-capacity micro HDDs has been towards lighter, thinner, shorter and smaller packages. Among all the options available for micro HDD driver IC assembly, warpage is an important issue related to micro HDD driver IC manufacturability and reliability. The optimal packaging manufacturing process for driver ICs for micro HDDs is chip scale package (...

A Spanning Bus Connected Hypercube: A New Scalable Optical Interconnection Network for Multiprocessors and Massively Parallel Systems

A new scalable interconnection topology suitable for massively parallel systems, called the Spanning Bus Connected Hypercube (SBCH), is proposed. The SBCH uses the hypercube topology as a basic building block and connects such building blocks using multi-dimensional spanning buses. In doing so, the SBCH combines positive features of both the hypercube (small diameter, high connectivity, symmetry,...

Journal:
  • Int. J. Systems Science

Volume 42, Issue -

Pages -

Publication date 2011